Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction
نویسنده
چکیده
We present a Markov part-of-speech tagger for which the P (w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.
منابع مشابه
Towards Learning Error-Driven Transformations for Information Extraction
Transformation-based and error-driven induction of rules has proven to be a successful and applicable technique for Part-of-Speech tagging and other natural language processing tasks. This paper presents the ongoing work to create a universal algorithm for learning transformations in the domain of information extraction and semantic annotation. Such a method can be applied to different and usef...
متن کاملPrototype-Driven Learning for Sequence Models
We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese,...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملبرچسبگذاری ادات سخن زبان فارسی با استفاده از مدل شبکۀ فازی
Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
متن کاملPart of Speech Tagging in Thai Language Using Support Vector Machine
The elastic-input neuro tagger and hybrid tagger, combined with a neural network and Brill's error-driven learning, have already been proposed for the purpose of constructing a practical tagger using as little training data as possible. When a small Thai corpus is used for training, these taggers have tagging accuracies of 94.4% and 95.5% (accounting only for the ambiguous words in terms of the...
متن کامل